Overview
Dataset statistics
| Number of variables | 20 |
|---|---|
| Number of observations | 50000 |
| Missing cells | 50134 |
| Missing cells (%) | 5.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 33.7 MiB |
| Average record size in memory | 707.0 B |
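The derived figures in this table can be cross-checked from the raw counts. The sketch below assumes that the only variables with missing values are the four whose non-zero "Missing" counts appear in this report. It also assumes that the missing-cell percentage is taken over all cells (variables × observations). The memory figure is computed from the rounded 33.7 MiB, so it is approximate.

```python
# Figures copied from the "Dataset statistics" table and from the
# per-variable "Missing" rows of this report.
n_variables = 20
n_observations = 50_000
total_bytes = 33.7 * 1024**2  # 33.7 MiB as printed (already rounded)

# Assumption: these are the only variables with missing values.
missing_by_variable = {
    "Function_prediction_source": 22_743,
    "Function_Prediction_source": 27_257,
    "Molecular_weight": 67,
    "Instability_index": 67,
}

missing_cells = sum(missing_by_variable.values())
missing_pct = 100 * missing_cells / (n_variables * n_observations)
avg_record_bytes = total_bytes / n_observations

print(missing_cells)                 # total missing cells
print(f"{missing_pct:.1f}%")         # share of all cells that are missing
print(f"{avg_record_bytes:.0f} B")   # approximate bytes per record
```

The per-variable counts sum to the reported 50134 missing cells, which suggests the remaining variables are complete.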
Variable types
| Text | 4 |
|---|---|
| Categorical | 5 |
| Numeric | 11 |
Aromaticity is highly overall correlated with Oxidized_coefficient and 1 other fields | High correlation |
Function_Prediction_source is highly overall correlated with Protein_source | High correlation |
Function_prediction_source is highly overall correlated with Phage_source and 1 other fields | High correlation |
Molecular_weight is highly overall correlated with Oxidized_coefficient and 1 other fields | High correlation |
Oxidized_coefficient is highly overall correlated with Aromaticity and 2 other fields | High correlation |
Phage_source is highly overall correlated with Function_prediction_source and 1 other fields | High correlation |
Protein_source is highly overall correlated with Function_Prediction_source and 2 other fields | High correlation |
Reduced_coefficient is highly overall correlated with Aromaticity and 2 other fields | High correlation |
Start is highly overall correlated with Stop | High correlation |
Stop is highly overall correlated with Start | High correlation |
Protein_source is highly imbalanced (93.8%) | Imbalance |
Function_prediction_source has 22743 (45.5%) missing values | Missing |
Function_Prediction_source has 27257 (54.5%) missing values | Missing |
Protein_ID has unique values | Unique |
Aromaticity has 8299 (16.6%) zeros | Zeros |
Instability_index has 827 (1.7%) zeros | Zeros |
Helix_fraction has 2146 (4.3%) zeros | Zeros |
Turn_fraction has 3063 (6.1%) zeros | Zeros |
Sheet_fraction has 2487 (5.0%) zeros | Zeros |
Reduced_coefficient has 13787 (27.6%) zeros | Zeros |
Oxidized_coefficient has 13261 (26.5%) zeros | Zeros |
Reproduction
| Analysis started | 2025-07-15 20:28:51.795512 |
|---|---|
| Analysis finished | 2025-07-15 20:29:05.993714 |
| Duration | 14.2 seconds |
| Software version | ydata-profiling v0.0.dev0 |
| Download configuration | config.json |
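As a quick sanity check, the reported duration follows from the two timestamps above. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2025-07-15 20:28:51.795512")
finished = datetime.fromisoformat("2025-07-15 20:29:05.993714")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.1f} seconds")
```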
Variables
Phage_ID
Text
| Distinct | 47819 |
|---|---|
| Distinct (%) | 95.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.4 MiB |
Length
| Max length | 87 |
|---|---|
| Median length | 85 |
| Mean length | 34.49438 |
| Min length | 5 |
Unique
| Unique | 45753 |
|---|---|
| Unique (%) | 91.5% |
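"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts only the values that occur exactly once. The sketch below illustrates the distinction on a toy column. The IDs are taken from the sample rows, but the repetition pattern is made up for illustration.

```python
from collections import Counter

# Toy column: the repetitions here are hypothetical, not from the dataset.
phage_ids = [
    "NC_001416.1", "NC_001416.1",
    "NC_001629.1",
    "NC_001825.1", "NC_001825.1",
    "NC_001902.1",
]

counts = Counter(phage_ids)
n_distinct = len(counts)                               # different values
n_unique = sum(1 for c in counts.values() if c == 1)   # values seen once

print(n_distinct, n_unique)

# Applied to the reported figures: distinct values that repeat.
repeated_values = 47_819 - 45_753
print(repeated_values)
```

Under this reading, 2066 of the distinct Phage_ID values appear in more than one row.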
Sample
| 1st row | NC_001416.1 |
|---|---|
| 2nd row | NC_001629.1 |
| 3rd row | NC_001825.1 |
| 4th row | NC_001902.1 |
| 5th row | NC_001271.1 |
| Value | Count | Frequency (%) |
| nc_048047.1 | 6 | < 0.1% |
| mgv-genome-0357750 | 4 | < 0.1% |
| nc_042047.1 | 4 | < 0.1% |
| imgvr_uvig_3300045988_177519|3300045988|ga0495776_003599 | 4 | < 0.1% |
| uvig_458124 | 4 | < 0.1% |
| mgv-genome-0379300 | 4 | < 0.1% |
| uvig_25220 | 4 | < 0.1% |
| mgv-genome-0376919 | 4 | < 0.1% |
| mgv-genome-0376593 | 4 | < 0.1% |
| imgvr_uvig_3300007222_000012|3300007222|ga0104061_100089 | 3 | < 0.1% |
| Other values (47809) | 49959 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 189862 | 11.0% |
| _ | 137375 | 8.0% |
| 3 | 106332 | 6.2% |
| 1 | 90559 | 5.3% |
| 2 | 84027 | 4.9% |
| 8 | 82093 | 4.8% |
| 5 | 79942 | 4.6% |
| 4 | 78529 | 4.6% |
| 9 | 73036 | 4.2% |
| 7 | 70932 | 4.1% |
| Other values (56) | 732032 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1724719 |
Protein_source
Categorical
High correlation  Imbalance 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.1 MiB |
| prodigal | 49142 |
|---|---|
| RefSeq | 567 |
| Genbank | 256 |
| DDBJ | 22 |
| EMBL | 13 |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 7.9694 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | RefSeq |
|---|---|
| 2nd row | RefSeq |
| 3rd row | RefSeq |
| 4th row | RefSeq |
| 5th row | RefSeq |
Common Values
| Value | Count | Frequency (%) |
| prodigal | 49142 | 98.3% |
| RefSeq | 567 | 1.1% |
| Genbank | 256 | 0.5% |
| DDBJ | 22 | < 0.1% |
| EMBL | 13 | < 0.1% |
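The 93.8% imbalance alert for this variable can be reproduced from the counts above with a normalized-entropy score: one minus the Shannon entropy of the class distribution divided by the maximum entropy for that number of classes. Treat this as a sketch. The function name is hypothetical, and that the tool computes the score in exactly this way is an assumption, although the result matches the reported figure.

```python
import math

# Counts copied from the "Common Values" table.
counts = {"prodigal": 49_142, "RefSeq": 567, "Genbank": 256, "DDBJ": 22, "EMBL": 13}

def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(class_counts)
    k = len(class_counts)
    if k < 2:
        return 0.0  # a single class has no distribution to compare
    entropy = -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)
    return 1 - entropy / math.log2(k)

score = imbalance_score(list(counts.values()))
print(f"{100 * score:.1f}%")
```

A score this close to 100% reflects that a single class (prodigal) accounts for almost all rows.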
Most occurring characters
| Value | Count | Frequency (%) |
| a | 49398 | 12.4% |
| r | 49142 | 12.3% |
| p | 49142 | 12.3% |
| o | 49142 | 12.3% |
| d | 49142 | 12.3% |
| i | 49142 | 12.3% |
| g | 49142 | 12.3% |
| l | 49142 | 12.3% |
| e | 1390 | 0.3% |
| R | 567 | 0.1% |
| Other values (13) | 3121 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 398470 |
Function_prediction_source
Categorical
High correlation  Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 22743 |
| Missing (%) | 45.5% |
| Memory size | 3.0 MiB |
| eggNOG-mapper | 10996 |
|---|---|
| Iterative search | 9884 |
| - | 5519 |
| RefSeq | 567 |
| Genbank | 256 |
| Other values (2) | 35 |
Length
| Max length | 16 |
|---|---|
| Median length | 13 |
| Mean length | 11.444583 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | RefSeq |
|---|---|
| 2nd row | RefSeq |
| 3rd row | RefSeq |
| 4th row | RefSeq |
| 5th row | RefSeq |
Common Values
| Value | Count | Frequency (%) |
| eggNOG-mapper | 10996 | 22.0% |
| Iterative search | 9884 | 19.8% |
| - | 5519 | 11.0% |
| RefSeq | 567 | 1.1% |
| Genbank | 256 | 0.5% |
| DDBJ | 22 | < 0.1% |
| EMBL | 13 | < 0.1% |
| (Missing) | 22743 | 45.5% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| eggnog-mapper | 10996 | |
| iterative | 9884 | |
| search | 9884 | |
| - | 5519 | |
| refseq | 567 | 1.5% |
| genbank | 256 | 0.7% |
| ddbj | 22 | 0.1% |
| embl | 13 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 53034 | 17.0% |
| a | 31020 | 9.9% |
| r | 30764 | 9.9% |
| g | 21992 | 7.0% |
| p | 21992 | 7.0% |
| t | 19768 | 6.3% |
| - | 16515 | 5.3% |
| G | 11252 | 3.6% |
| m | 10996 | 3.5% |
| N | 10996 | 3.5% |
| Other values (21) | 83616 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 311945 |
Start
Real number (ℝ)
High correlation 
| Distinct | 34163 |
|---|---|
| Distinct (%) | 68.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29004.409 |
| Minimum | 1 |
|---|---|
| Maximum | 475561 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1417.95 |
| Q1 | 8983 |
| median | 20791.5 |
| Q3 | 37135.5 |
| 95-th percentile | 87222.05 |
| Maximum | 475561 |
| Range | 475560 |
| Interquartile range (IQR) | 28152.5 |
Descriptive statistics
| Standard deviation | 31420.546 |
|---|---|
| Coefficient of variation (CV) | 1.0833024 |
| Kurtosis | 15.777577 |
| Mean | 29004.409 |
| Median Absolute Deviation (MAD) | 13356.5 |
| Skewness | 3.0413867 |
| Sum | 1.4502204 × 10⁹ |
| Variance | 9.872507 × 10⁸ |
| Monotonicity | Not monotonic |
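Several of the statistics above are derived from others in the same tables, so they can be checked against each other. The sketch below recomputes the range, IQR, coefficient of variation, variance and sum for Start from the reported minimum, maximum, quartiles, mean and standard deviation. Because the inputs are rounded as printed, the results agree with the report only to the displayed precision.

```python
import math

# Values copied from the Start tables above.
n = 50_000                      # observations, none missing
mean = 29004.409
std = 31420.546
q1, q3 = 8983, 37135.5
minimum, maximum = 1, 475561

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance
total = mean * n                 # Sum

print(value_range, iqr)
print(f"{cv:.7f}")
print(f"{variance:.4e}", f"{total:.4e}")

# Each derived value should match the report up to rounding.
assert math.isclose(cv, 1.0833024, abs_tol=1e-6)
```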
| Value | Count | Frequency (%) |
| 1 | 245 | 0.5% |
| 2 | 175 | 0.4% |
| 3 | 161 | 0.3% |
| 17781 | 9 | < 0.1% |
| 50 | 8 | < 0.1% |
| 10213 | 7 | < 0.1% |
| 1272 | 7 | < 0.1% |
| 868 | 7 | < 0.1% |
| 3446 | 7 | < 0.1% |
| 26335 | 7 | < 0.1% |
| Other values (34153) | 49367 |
| Value | Count | Frequency (%) |
| 1 | 245 | |
| 2 | 175 | |
| 3 | 161 | |
| 5 | 2 | < 0.1% |
| 6 | 2 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
| 13 | 1 | < 0.1% |
| 14 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 475561 | 1 | |
| 437464 | 1 | |
| 423051 | 1 | |
| 380045 | 1 | |
| 375840 | 1 | |
| 374983 | 1 | |
| 360500 | 1 | |
| 360257 | 1 | |
| 357355 | 1 | |
| 356721 | 1 |
Stop
Real number (ℝ)
High correlation 
| Distinct | 34455 |
|---|---|
| Distinct (%) | 68.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29693.313 |
| Minimum | 60 |
|---|---|
| Maximum | 475914 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 60 |
|---|---|
| 5-th percentile | 2049.95 |
| Q1 | 9712 |
| median | 21529 |
| Q3 | 37815 |
| 95-th percentile | 87792.65 |
| Maximum | 475914 |
| Range | 475854 |
| Interquartile range (IQR) | 28103 |
Descriptive statistics
| Standard deviation | 31426.256 |
|---|---|
| Coefficient of variation (CV) | 1.0583614 |
| Kurtosis | 15.793947 |
| Mean | 29693.313 |
| Median Absolute Deviation (MAD) | 13347 |
| Skewness | 3.0431639 |
| Sum | 1.4846656 × 10⁹ |
| Variance | 9.8760958 × 10⁸ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 3690 | 7 | < 0.1% |
| 25677 | 7 | < 0.1% |
| 9299 | 7 | < 0.1% |
| 2020 | 7 | < 0.1% |
| 9035 | 7 | < 0.1% |
| 2488 | 7 | < 0.1% |
| 4558 | 7 | < 0.1% |
| 12925 | 7 | < 0.1% |
| 2502 | 6 | < 0.1% |
| 7254 | 6 | < 0.1% |
| Other values (34445) | 49932 |
| Value | Count | Frequency (%) |
| 60 | 1 | < 0.1% |
| 66 | 3 | |
| 69 | 3 | |
| 70 | 1 | < 0.1% |
| 71 | 1 | < 0.1% |
| 72 | 2 | |
| 73 | 1 | < 0.1% |
| 75 | 2 | |
| 76 | 1 | < 0.1% |
| 78 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 475914 | 1 | |
| 437634 | 1 | |
| 423518 | 1 | |
| 380653 | 1 | |
| 377492 | 1 | |
| 377142 | 1 | |
| 361126 | 1 | |
| 360385 | 1 | |
| 358218 | 1 | |
| 357122 | 1 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | + |
|---|---|
| 2nd row | + |
| 3rd row | + |
| 4th row | + |
| 5th row | + |
Common Values
| Value | Count | Frequency (%) |
| + | 25135 | 50.3% |
| - | 24865 | 49.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| + | 25135 | 50.3% |
| - | 24865 | 49.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 50000 |
Protein_ID
Text
Unique 
| Distinct | 50000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.5 MiB |
Length
| Max length | 90 |
|---|---|
| Median length | 86 |
| Mean length | 37.3604 |
| Min length | 8 |
Unique
| Unique | 50000 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | NP_040636.1 |
|---|---|
| 2nd row | NP_042327.1 |
| 3rd row | NP_044830.1 |
| 4th row | NP_046963.1 |
| 5th row | NP_052068.1 |
| Value | Count | Frequency (%) |
| yp_239040.1 | 1 | < 0.1% |
| biochar_5611_7 | 1 | < 0.1% |
| np_040636.1 | 1 | < 0.1% |
| np_042327.1 | 1 | < 0.1% |
| np_044830.1 | 1 | < 0.1% |
| np_046963.1 | 1 | < 0.1% |
| np_052068.1 | 1 | < 0.1% |
| np_061623.1 | 1 | < 0.1% |
| np_073699.1 | 1 | < 0.1% |
| np_463475.1 | 1 | < 0.1% |
| Other values (49990) | 49990 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 195699 | 10.5% |
| _ | 186517 | 10.0% |
| 3 | 117947 | 6.3% |
| 1 | 108031 | 5.8% |
| 2 | 97342 | 5.2% |
| 5 | 88737 | 4.8% |
| 4 | 88675 | 4.7% |
| 8 | 88258 | 4.7% |
| 9 | 78991 | 4.2% |
| 7 | 77675 | 4.2% |
| Other values (56) | 740148 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1868020 |
Product
Text
| Distinct | 4015 |
|---|---|
| Distinct (%) | 8.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 4.0 MiB |
Length
| Max length | 1357 |
|---|---|
| Median length | 769 |
| Mean length | 26.1082 |
| Min length | 2 |
Unique
| Unique | 1764 |
|---|---|
| Unique (%) | 3.5% |
Sample
| 1st row | NinD protein |
|---|---|
| 2nd row | DNA polymerase |
| 3rd row | hypothetical protein |
| 4th row | terminase small subunit |
| 5th row | putative 0.6A protein |
| Value | Count | Frequency (%) |
| unknown | 19746 | 11.7% |
| protein | 12889 | 7.6% |
| of | 4841 | 2.9% |
| hypothetical | 4341 | 2.6% |
| the | 4085 | 2.4% |
| domain | 3773 | 2.2% |
| phage | 3275 | 1.9% |
| family | 2919 | 1.7% |
| dna | 2885 | 1.7% |
| to | 2200 | 1.3% |
| Other values (5428) | 108085 |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 129006 | 9.9% |
| (space) | 119054 | 9.1% |
| e | 101038 | 7.7% |
| o | 99410 | 7.6% |
| i | 88632 | 6.8% |
| t | 83003 | 6.4% |
| a | 75631 | 5.8% |
| r | 57145 | 4.4% |
| s | 49094 | 3.8% |
| l | 47479 | 3.6% |
| Other values (70) | 455918 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1305410 |
| Distinct | 67 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.2 MiB |
Length
| Max length | 63 |
|---|---|
| Median length | 9 |
| Mean length | 10.48208 |
| Min length | 6 |
Unique
| Unique | 11 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | unsorted; |
|---|---|
| 2nd row | replication; |
| 3rd row | hypothetical; |
| 4th row | packaging; |
| 5th row | unsorted; |
| Value | Count | Frequency (%) |
| unsorted | 27529 | 55.1% |
| hypothetical | 4342 | 8.7% |
| assembly | 3576 | 7.2% |
| replication | 2475 | 5.0% |
| infection | 2034 | 4.1% |
| packaging | 1723 | 3.4% |
| assembly;infection | 1604 | 3.2% |
| lysis | 1400 | 2.8% |
| integration | 1230 | 2.5% |
| regulation | 1060 | 2.1% |
| Other values (57) | 3027 | 6.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| ; | 54024 | 10.3% |
| e | 50355 | 9.6% |
| t | 49825 | 9.5% |
| n | 47552 | 9.1% |
| o | 43497 | 8.3% |
| s | 42397 | 8.1% |
| r | 35443 | 6.8% |
| u | 30588 | 5.8% |
| i | 30103 | 5.7% |
| d | 27726 | 5.3% |
| Other values (15) | 112594 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 524104 |
Molecular_weight
Real number (ℝ)
High correlation 
| Distinct | 44387 |
|---|---|
| Distinct (%) | 88.9% |
| Missing | 67 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4125.5155 |
| Minimum | 75.0666 |
|---|---|
| Maximum | 8913.4407 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 75.0666 |
|---|---|
| 5-th percentile | 424.4482 |
| Q1 | 2023.2453 |
| median | 4210.6971 |
| Q3 | 6223.9149 |
| 95-th percentile | 7677.612 |
| Maximum | 8913.4407 |
| Range | 8838.3741 |
| Interquartile range (IQR) | 4200.6696 |
Descriptive statistics
| Standard deviation | 2368.6116 |
|---|---|
| Coefficient of variation (CV) | 0.57413711 |
| Kurtosis | -1.2447258 |
| Mean | 4125.5155 |
| Median Absolute Deviation (MAD) | 2096.3227 |
| Skewness | -0.044836548 |
| Sum | 2.0599937 × 10⁸ |
| Variance | 5610320.7 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 146.1876 | 99 | 0.2% |
| 131.1729 | 95 | 0.2% |
| 147.1293 | 91 | 0.2% |
| 89.0932 | 67 | 0.1% |
| 174.201 | 60 | 0.1% |
| 117.1463 | 58 | 0.1% |
| 105.0926 | 58 | 0.1% |
| 133.1027 | 53 | 0.1% |
| 75.0666 | 45 | 0.1% |
| 146.1445 | 41 | 0.1% |
| Other values (44377) | 49266 | |
| (Missing) | 67 | 0.1% |
| Value | Count | Frequency (%) |
| 75.0666 | 45 | |
| 89.0932 | 67 | |
| 105.0926 | 58 | |
| 115.1305 | 13 | < 0.1% |
| 117.1463 | 58 | |
| 119.1192 | 19 | < 0.1% |
| 121.1582 | 7 | < 0.1% |
| 131.1729 | 95 | |
| 132.1179 | 29 | 0.1% |
| 133.1027 | 53 |
| Value | Count | Frequency (%) |
| 8913.4407 | 1 | |
| 8906.7809 | 1 | |
| 8766.9951 | 1 | |
| 8745.5145 | 1 | |
| 8722.928 | 1 | |
| 8711.0953 | 1 | |
| 8702.5438 | 1 | |
| 8696.8709 | 1 | |
| 8676.686 | 1 | |
| 8671.135 | 1 |
Aromaticity
Real number (ℝ)
High correlation  Zeros 
| Distinct | 470 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.090285378 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 8299 |
| Zeros (%) | 16.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.041666667 |
| median | 0.083333333 |
| Q3 | 0.125 |
| 95-th percentile | 0.2 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.083333333 |
Descriptive statistics
| Standard deviation | 0.080309898 |
|---|---|
| Coefficient of variation (CV) | 0.88951167 |
| Kurtosis | 30.458213 |
| Mean | 0.090285378 |
| Median Absolute Deviation (MAD) | 0.041666667 |
| Skewness | 3.5284955 |
| Sum | 4514.2689 |
| Variance | 0.0064496797 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 8299 | 16.6% |
| 0.1428571429 | 1047 | 2.1% |
| 0.1 | 1011 | 2.0% |
| 0.125 | 999 | 2.0% |
| 0.1111111111 | 973 | 1.9% |
| 0.09090909091 | 958 | 1.9% |
| 0.1666666667 | 826 | 1.7% |
| 0.07692307692 | 800 | 1.6% |
| 0.08333333333 | 782 | 1.6% |
| 0.07142857143 | 729 | 1.5% |
| Other values (460) | 33576 |
| Value | Count | Frequency (%) |
| 0 | 8299 | |
| 0.01428571429 | 20 | < 0.1% |
| 0.01449275362 | 17 | < 0.1% |
| 0.01470588235 | 18 | < 0.1% |
| 0.01492537313 | 16 | < 0.1% |
| 0.01515151515 | 22 | < 0.1% |
| 0.01538461538 | 28 | 0.1% |
| 0.015625 | 19 | < 0.1% |
| 0.01587301587 | 22 | < 0.1% |
| 0.01612903226 | 24 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 84 | |
| 0.6666666667 | 18 | < 0.1% |
| 0.6 | 8 | < 0.1% |
| 0.6 | 2 | < 0.1% |
| 0.5526315789 | 1 | < 0.1% |
| 0.5 | 178 | |
| 0.4666666667 | 3 | < 0.1% |
| 0.4615384615 | 1 | < 0.1% |
| 0.4545454545 | 1 | < 0.1% |
| 0.4444444444 | 6 | < 0.1% |
Instability_index
Real number (ℝ)
Zeros 
| Distinct | 39095 |
|---|---|
| Distinct (%) | 78.3% |
| Missing | 67 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.583318 |
| Minimum | -86.5 |
|---|---|
| Maximum | 388.53333 |
| Zeros | 827 |
| Zeros (%) | 1.7% |
| Negative | 3310 |
| Negative (%) | 6.6% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | -86.5 |
|---|---|
| 5-th percentile | -4.221746 |
| Q1 | 17.705405 |
| median | 33.626087 |
| Q3 | 50.304 |
| 95-th percentile | 82.6625 |
| Maximum | 388.53333 |
| Range | 475.03333 |
| Interquartile range (IQR) | 32.598595 |
Descriptive statistics
| Standard deviation | 29.020131 |
|---|---|
| Coefficient of variation (CV) | 0.81555438 |
| Kurtosis | 5.0529155 |
| Mean | 35.583318 |
| Median Absolute Deviation (MAD) | 16.294658 |
| Skewness | 1.0631055 |
| Sum | 1776781.8 |
| Variance | 842.16798 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 827 | 1.7% |
| 5 | 520 | 1.0% |
| 6.666666667 | 319 | 0.6% |
| 7.5 | 211 | 0.4% |
| 8 | 134 | 0.3% |
| -13.725 | 95 | 0.2% |
| 8.333333333 | 94 | 0.2% |
| -21.63333333 | 91 | 0.2% |
| 55.65 | 79 | 0.2% |
| -37.45 | 78 | 0.2% |
| Other values (39085) | 47485 |
| Value | Count | Frequency (%) |
| -86.5 | 1 | < 0.1% |
| -77.225 | 1 | < 0.1% |
| -74.83333333 | 2 | < 0.1% |
| -73 | 1 | < 0.1% |
| -72.525 | 3 | < 0.1% |
| -71.73333333 | 1 | < 0.1% |
| -70.15 | 1 | < 0.1% |
| -70.15 | 22 | |
| -69.1 | 3 | < 0.1% |
| -68.56666667 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 388.5333333 | 1 | < 0.1% |
| 291.4 | 4 | |
| 272.675 | 1 | < 0.1% |
| 263.4666667 | 1 | < 0.1% |
| 261.8 | 4 | |
| 260.8 | 1 | < 0.1% |
| 249.3166667 | 1 | < 0.1% |
| 247.625 | 1 | < 0.1% |
| 243.4571429 | 1 | < 0.1% |
| 241.5111111 | 1 | < 0.1% |
Isoelectric_point
Real number (ℝ)
| Distinct | 19764 |
|---|---|
| Distinct (%) | 39.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.8159599 |
| Minimum | 4.0500284 |
|---|---|
| Maximum | 11.999968 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 4.0500284 |
|---|---|
| 5-th percentile | 4.0500284 |
| Q1 | 4.6234201 |
| median | 6.0697451 |
| Q3 | 9.1382807 |
| 95-th percentile | 10.61255 |
| Maximum | 11.999968 |
| Range | 7.9499393 |
| Interquartile range (IQR) | 4.5148605 |
Descriptive statistics
| Standard deviation | 2.3580697 |
|---|---|
| Coefficient of variation (CV) | 0.34596297 |
| Kurtosis | -1.2739514 |
| Mean | 6.8159599 |
| Median Absolute Deviation (MAD) | 1.9132004 |
| Skewness | 0.41196447 |
| Sum | 340798 |
| Variance | 5.5604928 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 4.050028419 | 4589 | 9.2% |
| 5.525000191 | 804 | 1.6% |
| 11.99996777 | 521 | 1.0% |
| 8.750052071 | 409 | 0.8% |
| 9.750021172 | 261 | 0.5% |
| 5.57001667 | 161 | 0.3% |
| 5.240009499 | 148 | 0.3% |
| 5.494989204 | 148 | 0.3% |
| 11.00083675 | 146 | 0.3% |
| 10.00273724 | 141 | 0.3% |
| Other values (19754) | 42672 |
| Value | Count | Frequency (%) |
| 4.050028419 | 4589 | |
| 4.05110836 | 1 | < 0.1% |
| 4.051335716 | 1 | < 0.1% |
| 4.051790428 | 1 | < 0.1% |
| 4.052586174 | 2 | < 0.1% |
| 4.052699852 | 1 | < 0.1% |
| 4.052984047 | 1 | < 0.1% |
| 4.053040886 | 1 | < 0.1% |
| 4.053097725 | 1 | < 0.1% |
| 4.053211403 | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 11.99996777 | 521 | |
| 11.92157421 | 1 | < 0.1% |
| 11.91712589 | 1 | < 0.1% |
| 11.91532078 | 1 | < 0.1% |
| 11.91480503 | 2 | < 0.1% |
| 11.91254864 | 2 | < 0.1% |
| 11.91196842 | 1 | < 0.1% |
| 11.91042118 | 1 | < 0.1% |
| 11.90810032 | 1 | < 0.1% |
| 11.90552158 | 1 | < 0.1% |
Helix_fraction
Real number (ℝ)
Zeros 
| Distinct | 1165 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.2954387 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 2146 |
| Zeros (%) | 4.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.083333333 |
| Q1 | 0.23529412 |
| median | 0.296875 |
| Q3 | 0.35294118 |
| 95-th percentile | 0.5 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.11764706 |
Descriptive statistics
| Standard deviation | 0.12606572 |
|---|---|
| Coefficient of variation (CV) | 0.42670688 |
| Kurtosis | 6.2092713 |
| Mean | 0.2954387 |
| Median Absolute Deviation (MAD) | 0.058680556 |
| Skewness | 0.9435565 |
| Sum | 14771.935 |
| Variance | 0.015892567 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.3333333333 | 2507 | 5.0% |
| 0 | 2146 | 4.3% |
| 0.25 | 1566 | 3.1% |
| 0.2857142857 | 1141 | 2.3% |
| 0.5 | 1029 | 2.1% |
| 0.2 | 918 | 1.8% |
| 0.3 | 786 | 1.6% |
| 0.4 | 727 | 1.5% |
| 0.375 | 620 | 1.2% |
| 0.2727272727 | 608 | 1.2% |
| Other values (1155) | 37952 |
| Value | Count | Frequency (%) |
| 0 | 2146 | |
| 0.01923076923 | 2 | < 0.1% |
| 0.02173913043 | 2 | < 0.1% |
| 0.02222222222 | 5 | < 0.1% |
| 0.02272727273 | 5 | < 0.1% |
| 0.02380952381 | 1 | < 0.1% |
| 0.025 | 1 | < 0.1% |
| 0.02564102564 | 3 | < 0.1% |
| 0.02631578947 | 2 | < 0.1% |
| 0.02777777778 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 318 | |
| 0.875 | 2 | < 0.1% |
| 0.8571428571 | 2 | < 0.1% |
| 0.8571428571 | 4 | < 0.1% |
| 0.8421052632 | 1 | < 0.1% |
| 0.8333333333 | 1 | < 0.1% |
| 0.8333333333 | 2 | < 0.1% |
| 0.8 | 11 | < 0.1% |
| 0.7826086957 | 2 | < 0.1% |
| 0.75 | 45 | 0.1% |
Turn_fraction
Real number (ℝ)
Zeros 
| Distinct | 888 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.20626721 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 3063 |
| Zeros (%) | 6.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.14285714 |
| median | 0.2 |
| Q3 | 0.25641026 |
| 95-th percentile | 0.38461538 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.11355311 |
Descriptive statistics
| Standard deviation | 0.11427894 |
|---|---|
| Coefficient of variation (CV) | 0.55403347 |
| Kurtosis | 9.0603218 |
| Mean | 0.20626721 |
| Median Absolute Deviation (MAD) | 0.057142857 |
| Skewness | 1.6850346 |
| Sum | 10313.361 |
| Variance | 0.013059676 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 3063 | 6.1% |
| 0.25 | 1587 | 3.2% |
| 0.2 | 1535 | 3.1% |
| 0.1666666667 | 1380 | 2.8% |
| 0.3333333333 | 1104 | 2.2% |
| 0.1428571429 | 1043 | 2.1% |
| 0.2222222222 | 758 | 1.5% |
| 0.1818181818 | 689 | 1.4% |
| 0.2857142857 | 684 | 1.4% |
| 0.125 | 672 | 1.3% |
| Other values (878) | 37485 |
| Value | Count | Frequency (%) |
| 0 | 3063 | |
| 0.01886792453 | 2 | < 0.1% |
| 0.01923076923 | 2 | < 0.1% |
| 0.01960784314 | 1 | < 0.1% |
| 0.02040816327 | 1 | < 0.1% |
| 0.02083333333 | 1 | < 0.1% |
| 0.02127659574 | 1 | < 0.1% |
| 0.02173913043 | 1 | < 0.1% |
| 0.02272727273 | 2 | < 0.1% |
| 0.02380952381 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 180 | |
| 0.9090909091 | 1 | < 0.1% |
| 0.8888888889 | 1 | < 0.1% |
| 0.8484848485 | 1 | < 0.1% |
| 0.8333333333 | 3 | < 0.1% |
| 0.8 | 7 | < 0.1% |
| 0.75 | 35 | 0.1% |
| 0.7407407407 | 1 | < 0.1% |
| 0.7272727273 | 2 | < 0.1% |
| 0.724137931 | 2 | < 0.1% |
Sheet_fraction
Real number (ℝ)
Zeros 
| Distinct | 994 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.25590281 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 2487 |
| Zeros (%) | 5.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.031202652 |
| Q1 | 0.18634214 |
| median | 0.25 |
| Q3 | 0.31818182 |
| 95-th percentile | 0.45 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.13183968 |
Descriptive statistics
| Standard deviation | 0.12700454 |
|---|---|
| Coefficient of variation (CV) | 0.49629989 |
| Kurtosis | 6.2659779 |
| Mean | 0.25590281 |
| Median Absolute Deviation (MAD) | 0.065789474 |
| Skewness | 1.2451369 |
| Sum | 12795.14 |
| Variance | 0.016130152 |
| Monotonicity | Not monotonic |
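The derived figures in the Sheet_fraction tables above can be cross-checked from the reported mean, standard deviation, and quartiles. The following is a minimal sketch, assuming the usual definitions (CV = standard deviation / mean, IQR = Q3 - Q1, variance = standard deviation squared, sum = mean × number of observations). The variable names are illustrative and not part of the report.

```python
# Values copied from the Sheet_fraction tables in this report.
n_observations = 50000
mean = 0.25590281
std = 0.12700454
q1, q3 = 0.18634214, 0.31818182

# Derived statistics under the standard definitions (an assumption;
# the report itself does not spell out its formulas).
cv = std / mean                 # coefficient of variation
iqr = q3 - q1                   # interquartile range
variance = std ** 2             # variance from the standard deviation
total = mean * n_observations   # sum of all values

print(f"CV={cv:.8f}  IQR={iqr:.8f}  variance={variance:.9f}  sum={total:.2f}")
```

Small differences in the last digit are expected because the inputs are already rounded.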
| Value | Count | Frequency (%) |
| 0 | 2487 | 5.0% |
| 0.3333333333 | 1849 | 3.7% |
| 0.25 | 1749 | 3.5% |
| 0.2 | 1257 | 2.5% |
| 0.2857142857 | 871 | 1.7% |
| 0.5 | 868 | 1.7% |
| 0.1666666667 | 851 | 1.7% |
| 0.2222222222 | 728 | 1.5% |
| 0.1428571429 | 643 | 1.3% |
| 0.3 | 570 | 1.1% |
| Other values (984) | 38127 | 76.3% |
| Value | Count | Frequency (%) |
| 0 | 2487 | 5.0% |
| 0.01886792453 | 1 | < 0.1% |
| 0.02222222222 | 1 | < 0.1% |
| 0.02380952381 | 1 | < 0.1% |
| 0.0243902439 | 1 | < 0.1% |
| 0.02702702703 | 1 | < 0.1% |
| 0.02777777778 | 3 | < 0.1% |
| 0.02857142857 | 2 | < 0.1% |
| 0.02941176471 | 1 | < 0.1% |
| 0.0303030303 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 271 | 0.5% |
| 0.8571428571 | 2 | < 0.1% |
| 0.8571428571 | 1 | < 0.1% |
| 0.8333333333 | 2 | < 0.1% |
| 0.8 | 20 | < 0.1% |
| 0.7857142857 | 1 | < 0.1% |
| 0.7777777778 | 1 | < 0.1% |
| 0.7692307692 | 1 | < 0.1% |
| 0.75 | 59 | 0.1% |
| 0.7272727273 | 1 | < 0.1% |
Reduced_coefficient
Real number (ℝ)
High correlation  Zeros 
| Distinct | 71 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4941.6168 |
| Minimum | 0 |
|---|---|
| Maximum | 60500 |
| Zeros | 13787 |
| Zeros (%) | 27.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2980 |
| Q3 | 7450 |
| 95-th percentile | 15470 |
| Maximum | 60500 |
| Range | 60500 |
| Interquartile range (IQR) | 7450 |
Descriptive statistics
| Standard deviation | 5502.9335 |
|---|---|
| Coefficient of variation (CV) | 1.1135897 |
| Kurtosis | 2.6408864 |
| Mean | 4941.6168 |
| Median Absolute Deviation (MAD) | 2980 |
| Skewness | 1.4943833 |
| Sum | 2.4708084 × 10^8 |
| Variance | 30282277 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 13787 | 27.6% |
| 1490 | 8254 | 16.5% |
| 2980 | 5001 | 10.0% |
| 6990 | 3030 | 6.1% |
| 4470 | 2881 | 5.8% |
| 5500 | 2844 | 5.7% |
| 8480 | 2521 | 5.0% |
| 9970 | 1594 | 3.2% |
| 5960 | 1531 | 3.1% |
| 12490 | 1027 | 2.1% |
| Other values (61) | 7530 | 15.1% |
| Value | Count | Frequency (%) |
| 0 | 13787 | 27.6% |
| 1490 | 8254 | 16.5% |
| 2980 | 5001 | 10.0% |
| 4470 | 2881 | 5.8% |
| 5500 | 2844 | 5.7% |
| 5960 | 1531 | 3.1% |
| 6990 | 3030 | 6.1% |
| 7450 | 688 | 1.4% |
| 8480 | 2521 | 5.0% |
| 8940 | 311 | 0.6% |
| Value | Count | Frequency (%) |
| 60500 | 1 | < 0.1% |
| 45490 | 1 | < 0.1% |
| 44000 | 1 | < 0.1% |
| 39990 | 2 | < 0.1% |
| 38500 | 2 | < 0.1% |
| 37470 | 1 | < 0.1% |
| 35980 | 2 | < 0.1% |
| 34490 | 8 | < 0.1% |
| 33920 | 1 | < 0.1% |
| 33460 | 4 | < 0.1% |
Oxidized_coefficient
Real number (ℝ)
High correlation  Zeros 
| Distinct | 218 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4959.5943 |
| Minimum | 0 |
|---|---|
| Maximum | 60625 |
| Zeros | 13261 |
| Zeros (%) | 26.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 390.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2980 |
| Q3 | 7450 |
| 95-th percentile | 15720 |
| Maximum | 60625 |
| Range | 60625 |
| Interquartile range (IQR) | 7450 |
Descriptive statistics
| Standard deviation | 5512.5224 |
|---|---|
| Coefficient of variation (CV) | 1.1114866 |
| Kurtosis | 2.6293851 |
| Mean | 4959.5943 |
| Median Absolute Deviation (MAD) | 2980 |
| Skewness | 1.4917394 |
| Sum | 2.4797972 × 10^8 |
| Variance | 30387903 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 13261 | 26.5% |
| 1490 | 7606 | 15.2% |
| 2980 | 4343 | 8.7% |
| 6990 | 2640 | 5.3% |
| 5500 | 2608 | 5.2% |
| 4470 | 2470 | 4.9% |
| 8480 | 2110 | 4.2% |
| 9970 | 1318 | 2.6% |
| 5960 | 1227 | 2.5% |
| 12490 | 845 | 1.7% |
| Other values (208) | 11572 | 23.1% |
| Value | Count | Frequency (%) |
| 0 | 13261 | 26.5% |
| 125 | 438 | 0.9% |
| 250 | 75 | 0.1% |
| 375 | 10 | < 0.1% |
| 500 | 2 | < 0.1% |
| 750 | 1 | < 0.1% |
| 1490 | 7606 | 15.2% |
| 1615 | 522 | 1.0% |
| 1740 | 107 | 0.2% |
| 1865 | 18 | < 0.1% |
| Value | Count | Frequency (%) |
| 60625 | 1 | < 0.1% |
| 45490 | 1 | < 0.1% |
| 44000 | 1 | < 0.1% |
| 40365 | 1 | < 0.1% |
| 40115 | 1 | < 0.1% |
| 38625 | 1 | < 0.1% |
| 38500 | 1 | < 0.1% |
| 37470 | 1 | < 0.1% |
| 36105 | 1 | < 0.1% |
| 35980 | 1 | < 0.1% |
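The most common values of Reduced_coefficient and Oxidized_coefficient (1490, 2980, 5500, 6990, and the 125-step offsets in the oxidized column) match the standard molar extinction coefficient estimate at 280 nm: 5500 per tryptophan, 1490 per tyrosine, plus 125 per cystine (cysteine pair) in the oxidized form. The report does not state how these columns were computed, so the sketch below is only an assumption about their origin. The function name is hypothetical.

```python
from __future__ import annotations


def molar_extinction_coefficients(sequence: str) -> tuple[int, int]:
    """Estimate (reduced, oxidized) extinction coefficients at 280 nm.

    Assumes the common per-residue contributions: W = 5500, Y = 1490,
    and 125 per cystine, counting one cystine per pair of cysteines.
    """
    seq = sequence.upper()
    reduced = 5500 * seq.count("W") + 1490 * seq.count("Y")
    # Oxidized form: every complete pair of cysteines is treated as a cystine.
    oxidized = reduced + 125 * (seq.count("C") // 2)
    return reduced, oxidized


# One W and one Y, no cysteines: both forms agree.
print(molar_extinction_coefficients("MWKY"))
# Two cysteines and no W/Y: only the oxidized value is non-zero.
print(molar_extinction_coefficients("MCKC"))
```

Under this assumption, the large share of zeros in both columns corresponds to proteins with no tryptophan or tyrosine, and the near-perfect correlation between the two columns follows from the oxidized value being the reduced value plus a small cystine term.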
Phage_source
Categorical
High correlation 
| Distinct | 14 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.9 MiB |
| IMG_VR | 14007 |
|---|---|
| MGV | 12240 |
| GPD | 8797 |
| GOV2 | 5915 |
| TemPhD | 4087 |
| Other values (9) | 4954 |
Length
| Max length | 8 |
|---|---|
| Median length | 7 |
| Mean length | 4.35226 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | RefSeq |
|---|---|
| 2nd row | RefSeq |
| 3rd row | RefSeq |
| 4th row | RefSeq |
| 5th row | RefSeq |
Common Values
| Value | Count | Frequency (%) |
| IMG_VR | 14007 | 28.0% |
| MGV | 12240 | 24.5% |
| GPD | 8797 | 17.6% |
| GOV2 | 5915 | 11.8% |
| TemPhD | 4087 | 8.2% |
| CHVD | 2250 | 4.5% |
| GVD | 871 | 1.7% |
| RefSeq | 567 | 1.1% |
| PhagesDB | 404 | 0.8% |
| IGVD | 386 | 0.8% |
| Other values (4) | 476 | 1.0% |
Words
| Value | Count | Frequency (%) |
| img_vr | 14007 | 28.0% |
| mgv | 12240 | 24.5% |
| gpd | 8797 | 17.6% |
| gov2 | 5915 | 11.8% |
| temphd | 4087 | 8.2% |
| chvd | 2250 | 4.5% |
| gvd | 871 | 1.7% |
| refseq | 567 | 1.1% |
| phagesdb | 404 | 0.8% |
| igvd | 386 | 0.8% |
| Other values (4) | 476 | 1.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| G | 42472 | 19.5% |
| V | 35854 | 16.5% |
| M | 26260 | 12.1% |
| D | 16839 | 7.7% |
| R | 14574 | 6.7% |
| I | 14393 | 6.6% |
| _ | 14007 | 6.4% |
| P | 13288 | 6.1% |
| O | 5915 | 2.7% |
| 2 | 5915 | 2.7% |
| Other values (19) | 28096 | 12.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 217613 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 217613 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 217613 | 100.0% |
Function_Prediction_source
Categorical
High correlation  Missing 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 27257 |
| Missing (%) | 54.5% |
| Memory size | 2.8 MiB |
| - | 12448 |
|---|---|
| eggNOG-mapper | 8657 |
| Iterative search | 1638 |
Length
| Max length | 16 |
|---|---|
| Median length | 1 |
| Mean length | 6.6480675 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | eggNOG-mapper |
|---|---|
| 2nd row | eggNOG-mapper |
| 3rd row | eggNOG-mapper |
| 4th row | eggNOG-mapper |
| 5th row | eggNOG-mapper |
Common Values
| Value | Count | Frequency (%) |
| - | 12448 | 24.9% |
| eggNOG-mapper | 8657 | 17.3% |
| Iterative search | 1638 | 3.3% |
| (Missing) | 27257 | 54.5% |
Words
| Value | Count | Frequency (%) |
| - | 12448 | 51.1% |
| eggnog-mapper | 8657 | 35.5% |
| iterative | 1638 | 6.7% |
| search | 1638 | 6.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 22228 | 14.7% |
| - | 21105 | 14.0% |
| g | 17314 | 11.5% |
| p | 17314 | 11.5% |
| a | 11933 | 7.9% |
| r | 11933 | 7.9% |
| G | 8657 | 5.7% |
| O | 8657 | 5.7% |
| N | 8657 | 5.7% |
| m | 8657 | 5.7% |
| Other values (8) | 14742 | 9.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 151197 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 151197 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 151197 | 100.0% |
Interactions
Correlations
| | Aromaticity | Function_Prediction_source | Function_prediction_source | Helix_fraction | Instability_index | Isoelectric_point | Molecular_weight | Oxidized_coefficient | Phage_source | Protein_source | Reduced_coefficient | Sheet_fraction | Start | Stop | Strand | Turn_fraction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aromaticity | 1.000 | 0.009 | 0.017 | 0.461 | -0.015 | -0.011 | 0.194 | 0.595 | 0.023 | 0.003 | 0.599 | -0.233 | 0.038 | 0.037 | 0.009 | -0.032 |
| Function_Prediction_source | 0.009 | 1.000 | 0.000 | 0.043 | 0.024 | 0.037 | 0.040 | 0.008 | 0.329 | 1.000 | 0.006 | 0.050 | 0.066 | 0.063 | 0.019 | 0.059 |
| Function_prediction_source | 0.017 | 0.000 | 1.000 | 0.037 | 0.011 | 0.025 | 0.018 | 0.000 | 0.822 | 1.000 | 0.000 | 0.007 | 0.093 | 0.092 | 0.051 | 0.031 |
| Helix_fraction | 0.461 | 0.043 | 0.037 | 1.000 | -0.137 | -0.048 | 0.064 | 0.245 | 0.017 | 0.000 | 0.249 | -0.072 | 0.050 | 0.048 | 0.006 | -0.193 |
| Instability_index | -0.015 | 0.024 | 0.011 | -0.137 | 1.000 | -0.038 | 0.172 | 0.081 | 0.004 | 0.000 | 0.076 | 0.162 | -0.002 | -0.003 | 0.008 | 0.025 |
| Isoelectric_point | -0.011 | 0.037 | 0.025 | -0.048 | -0.038 | 1.000 | 0.058 | 0.031 | 0.023 | 0.011 | 0.032 | -0.272 | -0.002 | -0.002 | 0.000 | -0.009 |
| Molecular_weight | 0.194 | 0.040 | 0.018 | 0.064 | 0.172 | 0.058 | 1.000 | 0.627 | 0.011 | 0.000 | 0.620 | 0.044 | -0.001 | -0.002 | 0.000 | 0.031 |
| Oxidized_coefficient | 0.595 | 0.008 | 0.000 | 0.245 | 0.081 | 0.031 | 0.627 | 1.000 | 0.007 | 0.000 | 0.998 | -0.117 | 0.008 | 0.007 | 0.000 | 0.016 |
| Phage_source | 0.023 | 0.329 | 0.822 | 0.017 | 0.004 | 0.023 | 0.011 | 0.007 | 1.000 | 1.000 | 0.007 | 0.023 | 0.072 | 0.072 | 0.063 | 0.019 |
| Protein_source | 0.003 | 1.000 | 1.000 | 0.000 | 0.000 | 0.011 | 0.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.005 | 0.073 | 0.072 | 0.038 | 0.000 |
| Reduced_coefficient | 0.599 | 0.006 | 0.000 | 0.249 | 0.076 | 0.032 | 0.620 | 0.998 | 0.007 | 0.000 | 1.000 | -0.115 | 0.008 | 0.006 | 0.000 | 0.015 |
| Sheet_fraction | -0.233 | 0.050 | 0.007 | -0.072 | 0.162 | -0.272 | 0.044 | -0.117 | 0.023 | 0.005 | -0.115 | 1.000 | -0.023 | -0.026 | 0.009 | -0.321 |
| Start | 0.038 | 0.066 | 0.093 | 0.050 | -0.002 | -0.002 | -0.001 | 0.008 | 0.072 | 0.073 | 0.008 | -0.023 | 1.000 | 0.999 | 0.000 | -0.012 |
| Stop | 0.037 | 0.063 | 0.092 | 0.048 | -0.003 | -0.002 | -0.002 | 0.007 | 0.072 | 0.072 | 0.006 | -0.026 | 0.999 | 1.000 | 0.000 | -0.007 |
| Strand | 0.009 | 0.019 | 0.051 | 0.006 | 0.008 | 0.000 | 0.000 | 0.000 | 0.063 | 0.038 | 0.000 | 0.009 | 0.000 | 0.000 | 1.000 | 0.000 |
| Turn_fraction | -0.032 | 0.059 | 0.031 | -0.193 | 0.025 | -0.009 | 0.031 | 0.016 | 0.019 | 0.000 | 0.015 | -0.321 | -0.012 | -0.007 | 0.000 | 1.000 |
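The "High correlation" alerts in the overview can be related back to this matrix by filtering pairs on the absolute value of the coefficient. The sketch below uses a handful of coefficients copied from the table. The 0.5 cutoff and the helper name are assumptions chosen for illustration, not settings confirmed by the report.

```python
# A few coefficients copied from the correlation matrix (one entry per pair).
correlations = {
    ("Aromaticity", "Helix_fraction"): 0.461,
    ("Aromaticity", "Oxidized_coefficient"): 0.595,
    ("Aromaticity", "Reduced_coefficient"): 0.599,
    ("Molecular_weight", "Oxidized_coefficient"): 0.627,
    ("Molecular_weight", "Reduced_coefficient"): 0.620,
    ("Oxidized_coefficient", "Reduced_coefficient"): 0.998,
    ("Sheet_fraction", "Turn_fraction"): -0.321,
    ("Start", "Stop"): 0.999,
}


def high_correlation_pairs(corr, threshold=0.5):
    """Return the pairs whose absolute coefficient reaches the threshold."""
    return sorted(pair for pair, value in corr.items() if abs(value) >= threshold)


for left, right in high_correlation_pairs(correlations):
    print(f"{left} <-> {right}: {correlations[(left, right)]:+.3f}")
```

With this cutoff, the Aromaticity and Molecular_weight pairings with the two extinction coefficients are flagged, while weaker relationships such as Aromaticity with Helix_fraction (0.461) are not, which is consistent with the alerts listed in the overview.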
Missing values
Sample
| | Phage_ID | Protein_source | Function_prediction_source | Start | Stop | Strand | Protein_ID | Product | Protein_classification | Molecular_weight | Aromaticity | Instability_index | Isoelectric_point | Helix_fraction | Turn_fraction | Sheet_fraction | Reduced_coefficient | Oxidized_coefficient | Phage_source | Function_Prediction_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NC_001416.1 | RefSeq | RefSeq | 41950 | 42123 | + | NP_040636.1 | NinD protein | unsorted; | 6978.8276 | 0.105263 | 63.875439 | 6.183025 | 0.210526 | 0.228070 | 0.192982 | 19480 | 19730 | RefSeq | NaN |
| 1 | NC_001629.1 | RefSeq | RefSeq | 16550 | 16738 | + | NP_042327.1 | DNA polymerase | replication; | 7795.9223 | 0.096774 | 50.417742 | 9.660732 | 0.241935 | 0.129032 | 0.354839 | 6990 | 6990 | RefSeq | NaN |
| 2 | NC_001825.1 | RefSeq | RefSeq | 10787 | 11548 | + | NP_044830.1 | hypothetical protein | hypothetical; | 4674.0259 | 0.046512 | 46.130233 | 8.596811 | 0.186047 | 0.302326 | 0.139535 | 11000 | 11000 | RefSeq | NaN |
| 3 | NC_001902.1 | RefSeq | RefSeq | 4440 | 4961 | + | NP_046963.1 | terminase small subunit | packaging; | 3523.6074 | 0.000000 | 30.130303 | 4.050028 | 0.181818 | 0.212121 | 0.484848 | 0 | 0 | RefSeq | NaN |
| 4 | NC_001271.1 | RefSeq | RefSeq | 1758 | 1961 | + | NP_052068.1 | putative 0.6A protein | unsorted; | 7912.3506 | 0.194030 | 2.447910 | 10.502303 | 0.447761 | 0.134328 | 0.179104 | 11460 | 11460 | RefSeq | NaN |
| 5 | NC_002486.1 | RefSeq | RefSeq | 15075 | 15275 | + | NP_061623.1 | DUF1514 family protein | unsorted; | 7869.2770 | 0.090909 | 21.142424 | 9.205199 | 0.393939 | 0.136364 | 0.287879 | 11460 | 11460 | RefSeq | NaN |
| 6 | NC_002649.1 | RefSeq | RefSeq | 16590 | 17600 | + | NP_073699.1 | terminase | packaging; | 6530.5810 | 0.107143 | 12.292857 | 9.993905 | 0.357143 | 0.178571 | 0.232143 | 9970 | 9970 | RefSeq | NaN |
| 7 | NC_003216.1 | RefSeq | RefSeq | 8498 | 8920 | + | NP_463475.1 | tail assembly chaperone | assembly;infection; | 8141.2391 | 0.085714 | 53.127143 | 5.457078 | 0.271429 | 0.071429 | 0.314286 | 7450 | 7575 | RefSeq | NaN |
| 8 | NC_003216.1 | RefSeq | RefSeq | 21976 | 22425 | - | NP_463489.1 | anti-CRISPR protein AcrIIA1 | infection; | 1114.3356 | 0.000000 | -0.544444 | 8.497852 | 0.333333 | 0.111111 | 0.444444 | 0 | 0 | RefSeq | NaN |
| 9 | NC_003298.1 | RefSeq | RefSeq | 34775 | 35044 | + | NP_523344.1 | terminase small subunit | packaging; | 2187.3159 | 0.105263 | 5.315789 | 4.050028 | 0.315789 | 0.157895 | 0.157895 | 2980 | 2980 | RefSeq | NaN |
| | Phage_ID | Protein_source | Function_prediction_source | Start | Stop | Strand | Protein_ID | Product | Protein_classification | Molecular_weight | Aromaticity | Instability_index | Isoelectric_point | Helix_fraction | Turn_fraction | Sheet_fraction | Reduced_coefficient | Oxidized_coefficient | Phage_source | Function_Prediction_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49990 | biochar_4645 | prodigal | NaN | 9878 | 10351 | - | biochar_4645_15 | unknown | unsorted; | 1744.8678 | 0.000000 | 199.658824 | 11.999968 | 0.000000 | 0.470588 | 0.176471 | 0 | 0 | STV | - |
| 49991 | biochar_4678 | prodigal | NaN | 8287 | 8457 | - | biochar_4678_16 | unknown | unsorted; | 6529.1972 | 0.125000 | 33.925179 | 4.176893 | 0.357143 | 0.160714 | 0.303571 | 20970 | 20970 | STV | - |
| 49992 | biochar_4840 | prodigal | NaN | 7538 | 9232 | + | biochar_4840_12 | unknown | unsorted; | 428.4833 | 0.000000 | -13.725000 | 9.179992 | 0.000000 | 0.500000 | 0.000000 | 0 | 0 | STV | - |
| 49993 | biochar_5076 | prodigal | NaN | 5067 | 7295 | - | biochar_5076_8 | translation initiation factor activity | regulation; | 4110.4325 | 0.023810 | 67.423810 | 4.050028 | 0.119048 | 0.476190 | 0.238095 | 0 | 0 | STV | Iterative search |
| 49994 | biochar_5302 | prodigal | NaN | 5839 | 6039 | - | biochar_5302_8 | unknown | unsorted; | 7191.1912 | 0.030303 | 46.100000 | 9.915834 | 0.166667 | 0.242424 | 0.318182 | 1490 | 1490 | STV | - |
| 49995 | biochar_5324 | prodigal | NaN | 3761 | 4024 | + | biochar_5324_7 | unknown | unsorted; | 1876.0784 | 0.058824 | 33.547059 | 5.444857 | 0.176471 | 0.294118 | 0.235294 | 0 | 125 | STV | - |
| 49996 | biochar_5418 | prodigal | NaN | 7591 | 7872 | - | biochar_5418_17 | unknown | unsorted; | 2265.6500 | 0.000000 | 90.026087 | 8.747860 | 0.173913 | 0.347826 | 0.391304 | 0 | 0 | STV | - |
| 49997 | biochar_5440 | prodigal | NaN | 9755 | 10921 | + | biochar_5440_19 | head-tail adaptor | assembly;infection; | 4361.8676 | 0.179487 | 28.192308 | 6.913122 | 0.307692 | 0.307692 | 0.230769 | 8480 | 8480 | STV | Iterative search |
| 49998 | biochar_5583 | prodigal | NaN | 4937 | 5089 | + | biochar_5583_9 | unknown | unsorted; | 5802.6183 | 0.020000 | 44.792000 | 5.017599 | 0.340000 | 0.140000 | 0.320000 | 5500 | 5500 | STV | - |
| 49999 | biochar_5611 | prodigal | NaN | 4289 | 6148 | - | biochar_5611_7 | actin binding | replication; | 6568.9755 | 0.016949 | 50.081356 | 4.944731 | 0.101695 | 0.101695 | 0.338983 | 0 | 0 | STV | eggNOG-mapper |
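The sample rows show two columns whose names differ only in letter case, Function_prediction_source and Function_Prediction_source. Their missing counts (22743 and 27257) sum to the 50000 observations, and in the rows shown each record fills only one of them. The sketch below shows one possible way to coalesce them into a single value per record. Treating the "-" placeholder as missing, and the helper name itself, are assumptions rather than anything the report prescribes.

```python
# Three records abridged from the sample tables; NaN is represented as None.
rows = [
    {"Protein_ID": "NP_040636.1",
     "Function_prediction_source": "RefSeq",
     "Function_Prediction_source": None},
    {"Protein_ID": "biochar_4645_15",
     "Function_prediction_source": None,
     "Function_Prediction_source": "-"},
    {"Protein_ID": "biochar_5611_7",
     "Function_prediction_source": None,
     "Function_Prediction_source": "eggNOG-mapper"},
]


def coalesce_source(row, placeholder="-"):
    """Return the first usable value from the two case-variant columns.

    Values equal to `placeholder` are treated as missing (an assumption).
    """
    for key in ("Function_prediction_source", "Function_Prediction_source"):
        value = row.get(key)
        if value is not None and value != placeholder:
            return value
    return None


for row in rows:
    print(row["Protein_ID"], "->", coalesce_source(row))
```

Whether "-" should count as missing depends on how the upstream pipeline uses it, so that choice is left as a parameter.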